SOFA: An Extensible Logical Optimizer for UDF-heavy Dataflows

Authors

  • Astrid Rheinländer
  • Arvid Heise
  • Fabian Hueske
  • Ulf Leser
  • Felix Naumann

Abstract

Recent years have seen an increased interest in large-scale analytical dataflows on non-relational data. These dataflows are compiled into execution graphs scheduled on large compute clusters. In many novel application areas the predominant building blocks of such dataflows are user-defined predicates or functions (UDFs). However, the heavy use of UDFs is not well taken into account for dataflow optimization in current systems. SOFA is a novel and extensible optimizer for UDF-heavy dataflows. It builds on a concise set of properties for describing the semantics of Map/Reduce-style UDFs and a small set of rewrite rules, which use these properties to find a much larger number of semantically equivalent plan rewrites than possible with traditional techniques. A salient feature of our approach is extensibility: We arrange user-defined operators and their properties into a subsumption hierarchy, which considerably eases integration and optimization of new operators. We evaluate SOFA on a selection of UDF-heavy dataflows from different domains and compare its performance to three other algorithms for dataflow optimization. Our experiments reveal that SOFA finds efficient plans, outperforming the best plans found by its competitors by a factor of up to 6.
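
The idea of property-driven rewrites over an operator hierarchy can be illustrated with a small sketch. This is not SOFA's actual rule set or API: the `Operator` and `Hierarchy` classes, the read/write attribute sets used as properties, the operator names, and the rule that a child operator inherits the union of its ancestors' properties are all assumptions made for illustration. The sketch only shows how a reordering rule might consult inherited properties to decide whether two adjacent record-at-a-time UDFs can be swapped.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Operator:
    """Hypothetical operator description: a name, an optional parent in
    the hierarchy, and the attributes its UDF reads and writes."""
    name: str
    parent: Optional[str] = None
    reads: FrozenSet[str] = frozenset()
    writes: FrozenSet[str] = frozenset()


class Hierarchy:
    """Hypothetical subsumption hierarchy: an operator inherits the
    properties declared on all of its ancestors (an assumption)."""

    def __init__(self) -> None:
        self._ops: Dict[str, Operator] = {}

    def register(self, op: Operator) -> None:
        self._ops[op.name] = op

    def properties(self, name: str) -> Tuple[FrozenSet[str], FrozenSet[str]]:
        reads, writes = set(), set()
        current: Optional[str] = name
        while current is not None:  # walk up to the root
            op = self._ops[current]
            reads |= op.reads
            writes |= op.writes
            current = op.parent
        return frozenset(reads), frozenset(writes)


def can_reorder(h: Hierarchy, a: str, b: str) -> bool:
    """Illustrative rewrite precondition: two adjacent operators may be
    swapped if neither writes an attribute the other reads or writes."""
    reads_a, writes_a = h.properties(a)
    reads_b, writes_b = h.properties(b)
    return not (writes_a & reads_b or writes_b & reads_a or writes_a & writes_b)


h = Hierarchy()
h.register(Operator("annotate", reads=frozenset({"text"})))
# A new operator only declares what it adds; "reads text" is inherited.
h.register(Operator("annotate_persons", parent="annotate",
                    writes=frozenset({"persons"})))
h.register(Operator("filter_by_language", reads=frozenset({"language"})))
h.register(Operator("filter_by_person", reads=frozenset({"persons"})))

print(can_reorder(h, "annotate_persons", "filter_by_language"))
print(can_reorder(h, "annotate_persons", "filter_by_person"))
```

In this sketch the language filter could be moved ahead of the person annotator because their attribute sets do not conflict, while the person filter could not, since it reads what the annotator writes. Registering a new operator under an existing parent makes it visible to the same rule without changing the rule itself.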

Similar resources

The Volcano Optimizer Generator: Extensibility and Efficient Search

Emerging database application domains demand not only new functionality but also high performance. To satisfy these two requirements, the Volcano project provides efficient, extensible tools for query and request processing, particularly for object-oriented and scientific database systems. One of these tools is a new optimizer generator. Data model, logical algebra, physical algebra, and optimi...

Regression-Based Self-Tuning Modeling of Smooth User-Defined Function Costs for an Object-Relational Database Management System Query Optimizer

We present a new approach to modeling the execution costs of user-defined functions (UDFs) for the query optimizer of an object-relational DBMS (ORDBMS). Our approach self-tunes a cost model incrementally based on the costs of the recent executions of a UDF. The approach is centered on a feedback loop in which the feedback information comprises individual UDF execution records. Each execution r...

Control of an Extensible Query Optimizer: A Planning-Based Approach

In this paper we address the problem of controlling the execution of a query optimizer. We describe a control for the optimization process that is based on planning. The controller described here is a goal-directed planner that intermingles planning with the execution of query transformations, and uses execution results to direct further planning of optimizer processing. We describe this contr...

Plan-Per-Tuple Optimization Solution - Parallel Execution of Expensive User-Defined Functions

Object-Relational database systems allow users to define new user-defined types and functions. This presents new optimizer and run-time challenges to the database system on shared-nothing architectures. In this paper, we describe a new strategy we are exploring for the NCR Teradata Multimedia Database System; our focus is directing research for real applications we are seeing. In doing so, we w...

Spinning Fast Iterative Data Flows

Parallel dataflow systems are a central part of most analytic pipelines for big data. The iterative nature of many analysis and machine learning algorithms, however, is still a challenge for current systems. While certain types of bulk iterative algorithms are supported by novel dataflow frameworks, these systems cannot exploit computational dependencies present in many algorithms, such as grap...

Journal:
  • Inf. Syst.

Volume 52

Publication date: 2015